Search CORE

98 research outputs found

A multi-tier cached I/O architecture for massively parallel supercomputers

Author: García Blas Francisco Javier
Publication venue
Publication date: 25/05/2010
Field of study

Recent advances in storage technologies and high performance interconnects have made possible in the last years to build, more and more potent storage systems that serve thousands of nodes. The majority of storage systems of clusters and supercomputers from Top 500 list are managed by one of three scalable parallel file systems: GPFS, PVFS, and Lustre. Most large-scale scientific parallel applications are written in Message Passing Interface (MPI), which has become the de-facto standard for scalable distributed memory machines. One part of the MPI standard is related to I/O and has among its main goals the portability and efficiency of file system accesses. All of the above mentioned parallel file systems may be accessed also through the MPI-IO interface. The I/O access patterns of scientific parallel applications often consist of accesses to a large number of small, non-contiguous pieces of data. For small file accesses the performance is dominated by the latency of network transfers and disks. Parallel scientific applications lead to interleaved file access patterns with high interprocess spatial locality at the I/O nodes. Additionally, scientific applications exhibit repetitive behaviour when a loop or a function with loops issues I/O requests. When I/O access patterns are repetitive, caching and prefetching can effectively mask their access latency. These characteristics of the access patterns motivated several researchers to propose parallel I/O optimizations both at library and file system levels. However, these optimizations are not always integrated across different layers in the systems. In this dissertation we propose a novel generic parallel I/O architecture for clusters and supercomputers. Our design is aimed at large-scale parallel architectures with thousands of compute nodes. Besides acting as middleware for existing parallel file systems, our architecture provides on-line virtualization of storage resources. Another objective of this thesis is to factor out the common parallel I/O functionality from clusters and supercomputers in generic modules in order to facilitate porting of scientific applications across these platforms. Our solution is based on a multi-tier cache architecture, collective I/O, and asynchronous data staging strategies hiding the latency of data transfer between cache tiers. The thesis targets to reduce the file access latency perceived by the data-intensive parallel scientific applications by multi-layer asynchronous data transfers. In order to accomplish this objective, our techniques leverage the multi-core architectures by overlapping computation with communication and I/O in parallel threads. Prototypes of our solutions have been deployed on both clusters and Blue Gene supercomputers. Performance evaluation shows that the combination of collective strategies with overlapping of computation, communication, and I/O may bring a substantial performance benefit for access patterns common for parallel scientific applications.-----------------------------------------------------------------------------------------------------------------------------En los últimos años se ha observado un incremento sustancial de la cantidad de datos producidos por las aplicaciones científicas paralelas y de la necesidad de almacenar estos datos de forma persistente. Los sistemas de ficheros paralelos como PVFS, Lustre y GPFS han ofrecido una solución escalable para esta demanda creciente de almacenamiento. La mayoría de las aplicaciones científicas son escritas haciendo uso de la interfaz de paso de mensajes (MPI), que se ha convertido en un estándar de-facto de programación para las arquitecturas de memoria distribuida. Las aplicaciones paralelas que usan MPI pueden acceder a los sistemas de ficheros paralelos a través de la interfaz ofrecida por MPI-IO. Los patrones de acceso de las aplicaciones científicas paralelas consisten en un gran número de accesos pequeños y no contiguos. Para tamaños de acceso pequeños, el rendimiento viene limitado por la latencia de las transferencias de red y disco. Además, las aplicaciones científicas llevan a cabo accesos con una alta localidad espacial entre los distintos procesos en los nodos de E/S. Adicionalmente, las aplicaciones científicas presentan típicamente un comportamiento repetitivo. Cuando los patrones de acceso de E/S son repetitivos, técnicas como escritura demorada y lectura adelantada pueden enmascarar de forma eficiente las latencias de los accesos de E/S. Estas características han motivado a muchos investigadores en proponer optimizaciones de E/S tanto a nivel de biblioteca como a nivel del sistema de ficheros. Sin embargo, actualmente estas optimizaciones no se integran siempre a través de las distintas capas del sistema. El objetivo principal de esta tesis es proponer una nueva arquitectura genérica de E/S paralela para clusters y supercomputadores. Nuestra solución está basada en una arquitectura de caches en varias capas, una técnica de E/S colectiva y estrategias de acceso asíncronas que ocultan la latencia de transferencia de datos entre las distintas capas de caches. Nuestro diseño está dirigido a arquitecturas paralelas escalables con miles de nodos de cómputo. Además de actuar como middleware para los sistemas de ficheros paralelos existentes, nuestra arquitectura debe proporcionar virtualización on-line de los recursos de almacenamiento. Otro de los objeticos marcados para esta tesis es la factorización de las funcionalidades comunes en clusters y supercomputadores, en módulos genéricos que faciliten el despliegue de las aplicaciones científicas a través de estas plataformas. Se han desplegado distintos prototipos de nuestras soluciones tanto en clusters como en supercomputadores. Las evaluaciones de rendimiento demuestran que gracias a la combicación de las estratégias colectivas de E/S y del solapamiento de computación, comunicación y E/S, se puede obtener una sustancial mejora del rendimiento en los patrones de acceso anteriormente descritos, muy comunes en las aplicaciones paralelas de caracter científico

Universidad Carlos III de Madrid e-Archivo

A generic I/O architecture for data-intensiveapplications based on in-memorydistributed cache

Author: Carretero Pérez Jesús
García Blas Javier
Rodrigo Duro Francisco José
Publication venue
Publication date: 01/01/2016
Field of study

Proceedings of the First PhD Symposium on Sustainable Ultrascale Computing Systems (NESUS PhD 2016) Timisoara, Romania. February 8-11, 2016.The evolution in scientific computing towards data-intensive applications and the increase of heterogeneity in the computing resources, are exposing new challenges in the I/O layer requirements. We propose a generic I/O architecture for data-intensive applications based on in-memory distributed caching. This solution leverages the evolution of network capacities and the price drop in memory to improve I/O performance for I/O-bounded applications adaptable to existing high-performance scenarios. We have showed the potential improvementsEuropean Cooperation in Science and Technology. COSTThis work is partially supported by EU under the COST Program Action IC1305: Network for Sustainable Ultrascale Computing (NESUS). This work is partially supported by the grant TIN2013-41350-P, Scalable Data Management Techniques for High-End Computing Systems from the Spanish Ministry of Economy and Competitiveness

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Universidad Carlos III de Madrid e-Archivo

Proceedings of the Second International Workshop on Sustainable Ultrascale Computing Systems (NESUS 2015) Krakow, Poland

Author: Carretero Pérez Jesús
García Blas Francisco Javier
Jeannot Emmanuel
Wyrzykowski Roman
Publication venue
Publication date: 01/10/2015
Field of study

Proceedings of: Second International Workshop on Sustainable Ultrascale Computing Systems (NESUS 2015). Krakow (Poland), September 10-11, 2015

Universidad Carlos III de Madrid e-Archivo

New directions in mobile, hybrid, and heterogeneous clouds for cyberinfrastructures

Author: Antoniu Gabriel
Carretero Pérez Jesús
García Blas Francisco Javier
Petcu Dana
Publication venue: 'Elsevier BV'
Publication date: 01/10/2018
Field of study

With the increasing availability of mobile devices and data generated by end-users, scientific instruments and simulations solving many of our most important scientific and engineering problems require innovative technical solutions. These solutions should provide the whole chain to process data and services from the mobile users to the cloud infrastructure, which must also integrate heterogeneous clouds to provide availability, scalability, and data privacy. This special issue presents the results of particular research works showing advances on mobile, hybrid, and heterogeneous clouds for modern cyberinfrastructures

Universidad Carlos III de Madrid e-Archivo

Proceedings of the First International Workshop on Sustainable Ultrascale Computing Systems (NESUS 2014): Porto, Portugal

Author: Barbosa Jorge
Carretero Pérez Jesús
García Blas Francisco Javier
Morla Ricardo
Publication venue
Publication date: 01/01/2014
Field of study

Proceedings of: First International Workshop on Sustainable Ultrascale Computing Systems (NESUS 2014). Porto (Portugal), August 27-28, 2014

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Universidad Carlos III de Madrid e-Archivo

Boosting analyses in the life sciences via clusters, grids and clouds

Author: Carretero Pérez Jesús
García Blas Francisco Javier
Gesing Sandra
Montagnat Johan
Publication venue: 'Elsevier BV'
Publication date: 01/02/2017
Field of study

In the last 20 years, computational methods have become an important part of developing emerging technologies for the field of bioinformatics and biomedicine. Those methods rely heavily on large scale computational resources as they need to manage Tbytes or Pbytes of data with large-scale structural and functional relationships, TFlops or PFlops of computing power for simulating highly complex models, or many-task processes and workflows for processing and analyzing data. This special issue contains papers showing existing solutions and latest developments in Life Sciences and Computing Sciences to collaboratively explore new ideas and approaches to successfully apply distributed IT-systems in translational research, clinical intervention, and decision-making. (C) 2016 Published by Elsevier B.V

Universidad Carlos III de Madrid e-Archivo

Exposing data locality in HPC-based systems by using the HDFS backend

Author: Carretero Pérez Jesús
García Blas Francisco Javier
García Carballeira Félix
Rivadeneira López-Bravo José
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 16/12/2020
Field of study

This work was partially supported by the project “CABAHLA-CM: Convergencia Big data-Hpc: de los sensores a las Aplicaciones” S2018/TCS4423 from Madrid Regional Government and the European Union’s Horizon 2020 research, New Data Intensive Computing Methods for High-End and Edge Computing Platforms (DECIDE). Ref. PID2019-107858GB-I00 and innovation program under grant agreement No 801091, project “ÀSPIDE: Exascale programming models for extreme data processing”

Universidad Carlos III de Madrid e-Archivo

A GPU Accelerated High Performance Cloud Computing Infrastructure for Grid Computing Based Virtual Environmental Laboratory

Author: Florin Isaila
Francisco Javier García Blas
Giuliano Laccetti
Giulio Giunta
Raffaele Montella
Publication venue: 'IntechOpen'
Publication date: 01/01/2011
Field of study

IntechOpen

Archivio della ricerca - Università degli studi di Napoli "Parthenope"

Archivio della ricerca - Università degli studi di Napoli Federico II

Crossref

Exploiting stream parallelism of MRI reconstruction using GrPPI over multiple back-ends

Author: Carretero Pérez Jesús
García Blas Francisco Javier
García Sánchez José Daniel
Río Astorga David del
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2019
Field of study

Proceeding of: 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID), Larnaca, Cyprus, 14-17 May 2019In recent years, on-line processing of data streams has been established as a major computing paradigm. This is due mainly to two reasons: first, more and more data are generated in near real-time that need to be processed; the second reason is given by the need of efficient parallel applications. However, the above-mentioned areas expose a tough challenge over traditional data-analysis techniques, which have been forced to evolve to a stream perspective. In this work we present an comparative study of a stream-aware multi-staged application, which has been implemented using GrPPI, a generic and reusable parallel pattern interface for C++ applications. We demonstrate the benefits of using this interface in terms of programability, performance, and scalability.This work was supported by the EU project “ASPIDE: Exascale Programing Models for Extreme Data Processing” under grant 80109

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Universidad Carlos III de Madrid e-Archivo

Federated identity architecture of the european eID system

Author: Carretero Pérez Jesús
García Blas Francisco Javier
Izquierdo Moreno Guillermo
Vasile Cabezas Mario
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2018
Field of study

Federated identity management is a method that facilitates management of identity processes and policies among the collaborating entities without a centralized control. Nowadays, there are many federated identity solutions, however, most of them covers different aspects of the identification problem, solving in some cases specific problems. Thus, none of these initiatives has consolidated as a unique solution and surely it will remain like that in a near future. To assist users choosing a possible solution, we analyze different federated identify approaches, showing main features, and making a comparative study among them. The former problem is even worst when multiple organizations or countries already have legacy eID systems, as it is the case of Europe. In this paper, we also present the European eID solution, a purely federated identity system that aims to serve almost 500 million people and that could be extended in midterm also to eID companies. The system is now being deployed at the EU level and we present the basic architecture and evaluate its performance and scalability, showing that the solution is feasible from the point of view of performance while keeping security constrains in mind. The results show a good performance of the solution in local, organizational, and remote environments

Universidad Carlos III de Madrid e-Archivo